Abstract

Networks of person-person contacts form the substrate along which infectious diseases spread. Most 
network-based studies of the spread focus on the impact of variations in degree (the number of contacts 
an individual has). However, other effects such as clustering, variations in infectiousness or susceptibil- 
ity, or variations in closeness of contacts may play a significant role. We develop analytic techniques 
to predict how these effects alter the growth rate, probability, and size of epidemics and validate the 
predictions with a realistic social network. We find that (for given degree distribution and average trans- 
missibility) clustering is the dominant factor controlling the growth rate, heterogeneity in infectiousness 
is the dominant factor controlling the probability of an epidemic, and heterogeneity in susceptibility is 
the dominant factor controlling the size of an epidemic. Edge weights (measuring closeness or duration 
of contacts) have impact only if correlations exist between different edges. Combined, these effects can 
play a minor role in reinforcing one another, with the impact of clustering largest when the population 
is maximally heterogeneous or if the closer contacts are also strongly clustered. Our most significant 
contribution is a systematic way to address clustering in infectious disease models, and our results have 
a number of implications for the design of interventions. 

1 Introduction 

Recently H5N1 avian influenza and SARS have raised the profile of emerging infectious diseases. Both can 
infect humans, but have a primary animal host. Typically such zoonotic diseases emerge periodically into 
the human population and disappear (e.g., Ebola, Hanta Virus, and Rabies), but sometimes (e.g., HIV) the
disease achieves sustained person-to-person spread. With the advent of modern transportation networks, 
diseases that formerly emerged in isolated villages and died out without further spread may now spread 
worldwide. 

A number of interventions are available to control emerging diseases, each with distinct costs and benefits. 
To design optimal policies, we must address several related, but nevertheless distinct, questions. How fast 
would an epidemic spread? How likely is a single introduced infection to result in an epidemic? How many 
people would an epidemic infect? We quantify these using $\mathcal{R}_0$, the basic reproductive ratio, which measures
the average number of new cases each infection causes early in the outbreak; V, the probability that a single
infection sparks an epidemic; and A, the attack rate or fraction of the population infected in an epidemic.
Understanding these different quantities and what affects them helps us to select policies with maximal 
impact for given cost. 

Many different models are used to study disease spread. Perhaps the most important decision in de- 
veloping a model is how the interactions of the population are represented. Because of the complexity of 
the population, it is invariably necessary to make simplifying assumptions. The errors (and therefore, the 
conclusions) resulting from many of these approximations are not well-quantified. In this paper we will 
focus on quantifying the impact of clustering (the tendency to interact in small groups) and individual-scale 
heterogeneity on the spread of an epidemic. 

Based on how they handle clustering, models for population structure fit into a hierarchy of three classes 
(which in turn may be subdivided). At the simplest level the population is assumed to mix without any 
clustering. Most existing models fall into this category. At the most complex level, agent-based models are
used: the movements of each individual are tracked, and people who are in the same location are able to
infect one another. These models typically require significant resources to develop, and the clustering is 
explicitly included. An intermediate level of complexity attempts to introduce the clustering as a parameter 
(or several parameters). Usually these models only consider clustering in terms of the number of triangles 
in a network, but as we shall see, other structures may play a role. 

Before introducing the details of our model, we review some previous work. All the models we consider 
are Susceptible-Infected-Recovered (SIR) epidemic models [2], in which individuals begin susceptible, become 
infected by contacting infected individuals, and finally recover with immunity. 

For unclustered populations, ordinary differential equation (ODE) models were among the earliest models 
used [24] and remain the most common. They are deterministic, and so cannot directly calculate V, but they
give insight into the factors controlling $\mathcal{R}_0$ and A. Because they assume mass-action mixing, it is difficult to
incorporate individual heterogeneity in the number of contacts. More recently some network-based models 
have been introduced for unclustered populations [H [36l [23l [32l [29l [30l [3T] . These models represent the pop- 
ulation as nodes with edges between nodes representing contacts, along which disease spreads stochastically. 
Heterogeneity in the number of contacts is introduced by modifying the degree (number of edges) of each 
node. By neglecting clustering, these studies are able to make analytic predictions through branching process 
arguments. A recent sociological study [35] used surveys with participants recording the length and nature
of their contacts. This data is valuable for providing the contact distribution needed for the above network
models, and allows us to apply network results to real populations. However, this data does not directly tell
us anything about the clustering of the population resulting from family/work/other groups. Other recent
work by [32, 23] analytically addresses the impact of heterogeneity in infectiousness and susceptibility in
unclustered networks. 

Using agent-based simulations [HI O [TTl [171 [iHl [l] allows us to directly incorporate clustering. In these 
simulations, the population is a collection of individuals who move and contact one another. The modeller 
has complete control over the parameters governing interactions and how the disease spreads. This allows 
us to study many effects, but also introduces many parameters. It is difficult to test the accuracy of the 
assumptions used to generate these models and to extract which parameters are essential to the disease 
dynamics. The expense of developing these simulations is frequently prohibitive. 

In this paper we introduce a systematic approach for calculating the impact of clustering, and quantifying 
the error. Because our model investigates disease spread in clustered networks, we provide a more detailed 
review of previous work on clustering and disease. A few investigations have been made into the interaction of 
clustering with disease spread using network models. The attempts that have been made [2H [Mf [37 1 [4Ql [4 H [8] 
typically use approximations whose errors are not quantified, resulting in apparently contradictory results. 
A few papers [33, 42, 25] have considered clustering and heterogeneities, rigorously showing that increased
heterogeneity tends to decrease V and A, but without quantitative predictions. Recently [14] considered the
spread of epidemics in a class of random networks for which the number of triangles could be controlled. It 
may be inferred from their figure 3 that clustering decreases the growth rate and that sufficient clustering 
can increase the epidemic threshold. However, at small and moderate levels clustering appears not to 
alter the final size of epidemics significantly. Similar observations have been made by At first glance, 
this contradicts observations of [40, 41] that clustering significantly reduces the size of epidemics, but that
sufficiently strong clustering reduces the epidemic threshold (see also [37]), allowing epidemics at lower
transmissibility. The discrepancy in epidemic size may be resolved by noting that the networks in [40, 41]
have low average degree. We will see that clustering only affects the size if the typical degree is small
or clustering is very high. The apparent discrepancy in epidemic threshold with strong clustering may
be resolved by noting that the form of strong clustering considered by [40, 41] forces preferential contacts
between high degree nodes. The reduction in epidemic threshold is perhaps better understood in terms of 
degree-degree correlations than in terms of clustering. 

In this paper we develop techniques to incorporate general small-scale structure (beyond triangles) into 
the calculation of $\mathcal{R}_0$, V, and A. To calculate $\mathcal{R}_0$, we develop a systematic series expansion which allows
us to interpolate between unclustered and clustered results by including more terms. To calculate V and A,
we use a similar approach, but only give estimates on the size of correction terms. Our methods give us a
rigorous means to understand how unclustered results relate to more realistic populations, and our results
resolve the apparent discrepancies mentioned above. Our theory accurately predicts epidemic behaviour in a
more realistic contact network derived from an agent-based simulation of Portland, Oregon by EpiSimS [11].
We expand this to investigate the interplay of clustering, heterogeneities in individual infectiousness or
susceptibility, and variation in edge weights in their effect on $\mathcal{R}_0$, V, and A.

Figure 1: A sample network and several stages of an outbreak. Nodes begin susceptible (small circles),
become infected (empty large circles), possibly infecting others along edges, and then recover (solid large
circles). The outbreak finishes when no infected nodes remain.

The paper is organised as follows: Section [2] describes our model and networks and summarises earlier 
work on unclustered networks. These results will be the leading order terms for our expansions for clustered 
networks in the remainder of the paper. Section [3] considers how epidemics spread in a clustered network 
assuming homogeneous transmission. We derive the corrections to $\mathcal{R}_0$ and show that the corrections to V and
A are insignificant unless the typical degree is small or clustering very high. Section [4] considers epidemics
in clustered networks with heterogeneous infectiousness or susceptibility, building on section [3]. Section [5]
extends this further to consider epidemics spreading on clustered networks with weighted edges. Edges with
large weights tend to occur in family or work groups, which magnifies the impact of clustering. Finally,
section [6] discusses the implications of our results, particularly for designing interventions. We conclude that
in general, heterogeneity significantly impacts V and A, but not $\mathcal{R}_0$, while clustering impacts $\mathcal{R}_0$ significantly,
but not V and A. Heterogeneity or edge weights may enhance the impact of clustering.

2 Formulation 

2.1 The disease model 

We consider the spread of a disease using a discrete SIR model on a static network G. Nodes of G represent 
individuals and edges represent (potentially infectious) contacts. The contact structure of the network is fixed 
during the course of the outbreak. The degree k of a node u is the number of edges containing u. Figure [1]
shows a sample outbreak. A single infection, the index case, is chosen uniformly from the population to
begin an outbreak. Infection spreads along an edge from an infected node u to a susceptible node v with
probability $T_{uv}$, the transmissibility. The time it takes for infection and recovery to occur may vary but does
not affect our results. Once u recovers it cannot be reinfected. Typically for a large random network with
a population of $N = |G|$ nodes, the final size of outbreaks is either large, with $O(N)$ cumulative infections,
or small, with $O(\log N)$ infections [7]. Large outbreaks are epidemics and small outbreaks are non-epidemic
outbreaks.

2.1.1 Transmissibility 

A number of factors influence the transmissibility from u to v, such as the viral load and duration of infection
of u, the vaccination history and general health of v, the duration and nature of the contact between u and
v, and characteristics of the disease.

For each node u we denote its ability to infect others by $\mathcal{I}_u$ and its ability to be infected by $\mathcal{S}_u$. Each
edge has a weight $w_{uv}$. The parameter $\alpha$ measures disease-specific quantities. In most of our calculations
we assume these are scalars and follow [32, 10], setting

$$T_{uv} = 1 - e^{-\alpha \mathcal{I}_u \mathcal{S}_v w_{uv}} . \tag{1}$$

If all contacts are identical, $w_{uv}$ may be absorbed into $\alpha$, giving

$$T(\mathcal{I}_u, \mathcal{S}_v) = 1 - e^{-\alpha \mathcal{I}_u \mathcal{S}_v} . \tag{2}$$

Note that $T_{uv}$ is a number assigned to an edge, while $T(\mathcal{I}_u, \mathcal{S}_v)$ is a function which states what the
transmissibility between two nodes would be if they shared an edge.
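As a minimal sketch of the transmissibility above, assuming the exponential form $T_{uv} = 1 - e^{-\alpha \mathcal{I}_u \mathcal{S}_v w_{uv}}$, a single directed contact can be evaluated as follows. The function name `transmissibility` and its argument names are illustrative assumptions, not part of the model specification.

```python
import math


def transmissibility(alpha, infectiousness, susceptibility, weight=1.0):
    """Probability that u infects v along one edge (assumed exponential form).

    alpha          -- disease-specific scalar
    infectiousness -- ability of the infected node u to infect others
    susceptibility -- ability of the susceptible node v to be infected
    weight         -- edge weight; the default 1.0 treats all contacts as identical
    """
    return 1.0 - math.exp(-alpha * infectiousness * susceptibility * weight)
```

Because the infectiousness of u and the susceptibility of v enter separately, the values for the two directions of an edge generally differ: `transmissibility(a, I_u, S_v, w)` need not equal `transmissibility(a, I_v, S_u, w)`.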

With mild abuse of notation we denote the probability density functions (pdfs) of $\mathcal{I}$, $\mathcal{S}$, and w by
$P(\mathcal{I})$, $P(\mathcal{S})$, and $P(w)$ respectively. We assign $\mathcal{I}$ and $\mathcal{S}$ independently, but allow w to be assigned either
independently or based on observed contacts (i.e., by observing contacts in a population we may create a
static network with edge weights assigned based on the observed contact). If w is assigned independently, then
it is possible to eliminate edge weights from the analysis by marginalising over the distribution of weights.
However, if weights are not independent (for example work or family contacts tend to have correlated weights)
then the details of the distribution and the correlations are important.

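Given the infectiousness $\mathcal{I}_u$ of node u, we follow [32] and define its out-transmissibility

$$T_{out}(u) = \int T(\mathcal{I}_u, \mathcal{S}) P(\mathcal{S}) \, d\mathcal{S} . \tag{3}$$

This is the marginalised probability that u infects a randomly chosen neighbour given $\mathcal{I}_u$. From the definition
of $T_{out}$ and the pdf $P(\mathcal{I})$ we can calculate the pdf $Q_{out}(T_{out})$. We symmetrically define the in-transmissibility
$T_{in}$ and its pdf $Q_{in}(T_{in})$.

We denote the average of a quantity by $\langle \cdot \rangle$. The average transmissibility $\langle T \rangle$ is

$$\langle T \rangle = \iint T(\mathcal{I}, \mathcal{S}) P(\mathcal{I}) P(\mathcal{S}) \, d\mathcal{I} \, d\mathcal{S} = \int T_{out} Q_{out}(T_{out}) \, dT_{out} = \int T_{in} Q_{in}(T_{in}) \, dT_{in} . \tag{4}$$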

2.1.2 Epidemic percolation networks 

Rather than studying outbreaks as dynamic processes on networks, we may consider them in the context of 
Epidemic Percolation Networks (EPNs) [23, 33]. The EPN framework allows us to study epidemics as
static objects and is useful for quickly estimating V, A, and $\mathcal{R}_0$. In this section we summarise properties of
EPNs; more details are provided in [3, 32] and [A].

Once the properties of the nodes and edges are assigned, an EPN E is created as follows: We place each
node of G into E. For each edge {u, v} in G we place directed edges (u, v) and (v, u) into E independently
with probability $T_{uv}$ and $T_{vu}$ respectively. The nodes infected in an outbreak correspond exactly to those
nodes that may be reached from the index case following edges of E. More specifically, the distribution of
out-components of a node u in different EPN realisations matches the distribution of outbreaks resulting
from different epidemic realisations in the original model with u as the index case. It may be shown that the
distributions of out- and in-component sizes give us information about the probability of nodes to start an
epidemic or become infected in an epidemic. We will see that in a large population the structure of a single
EPN can be used to accurately estimate V, A, and $\mathcal{R}_0$.

Once we create an EPN and choose the index case, we define the rank of node v as the length of the
shortest directed path from the index case to v. If no such path exists, v is never infected.
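The construction of an EPN and the definition of rank can be sketched in a few lines. This is an illustrative sketch only: the function names `build_epn` and `ranks_from`, the edge-list input, and the callable `trans(u, v)` returning the transmissibility from u to v are assumptions made for the example.

```python
import random
from collections import deque


def build_epn(edges, trans, rng):
    """Build one EPN realisation as a dict: node -> list of out-neighbours.

    edges -- iterable of undirected edges (u, v) of the contact network G
    trans -- callable trans(u, v) giving the transmissibility from u to v
    rng   -- random.Random instance, so that realisations are reproducible
    """
    epn = {}
    for u, v in edges:
        epn.setdefault(u, [])
        epn.setdefault(v, [])
        # The two directions are kept independently of one another.
        if rng.random() < trans(u, v):
            epn[u].append(v)
        if rng.random() < trans(v, u):
            epn[v].append(u)
    return epn


def ranks_from(epn, index_case):
    """Breadth-first search: rank = shortest directed path length from the index case."""
    rank = {index_case: 0}
    queue = deque([index_case])
    while queue:
        u = queue.popleft()
        for v in epn.get(u, []):
            if v not in rank:
                rank[v] = rank[u] + 1
                queue.append(v)
    return rank  # nodes absent from the result are never infected
```

The keys of the returned dictionary form the out-component of the index case, that is, the set of nodes infected in the corresponding outbreak.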

Interchanging all arrow directions interchanges V and A. This means that if we can calculate V, then
A may be calculated by the same technique, but with the direction of infection reversed. Because of this,
we focus our attention on calculating V, and apply the same methodology to calculate A. An important
consequence is that if T is constant, then V = A [36, 32].

^ We follow [26] in using the term rank rather than generation, which has been used elsewhere but is potentially ambiguous.
The rank is the smallest number of infectious contacts between the index case and a node. It is possible that a different path
takes less time. The path infection actually follows is the path that is shortest in time, rather than in number of links.




2.1.3 The basic reproductive ratio

We expect that epidemics are possible if and only if the basic reproductive ratio $\mathcal{R}_0$ is greater than 1. That
is, if an average infection causes more than one new case, an epidemic may occur, but otherwise the outbreak
dies out quickly. However, this use of $\mathcal{R}_0$ is not consistent with the typical definition: the average number
of new infections caused by a single infected individual introduced into a fully susceptible population, which
gives $\mathcal{R}_0 = \langle T \rangle \langle k \rangle$. A more appropriate definition is the average number of new infections caused by infected
individuals early in outbreaks. The distinction is subtle, but results from the fact that whether an outbreak
can grow depends on whether the people of low rank infect more than one person each [12]. Low rank
individuals may be different from the average individual. Most obviously, they have more contacts [36, 16],
but with clustering, they also have a disproportionately large fraction of neighbours infected or recovered.

In order to quantify $\mathcal{R}_0$ more rigorously, we first define $N_r$ to be the number of people of rank r for a
given outbreak simulation. We then define the rank reproductive ratio

$$\mathcal{R}_{0,r} = \frac{\mathbb{E}[N_{r+1}]}{\mathbb{E}[N_r]} \tag{5}$$

to be the expected number of new cases caused by a rank r node (averaged over all possible outbreak
realisations). $\mathcal{R}_{0,0} = \langle T \rangle \langle k \rangle$ corresponds to the usual definition of $\mathcal{R}_0$. In practice, we find that $\mathcal{R}_{0,r}$
reaches a plateau quickly as r increases before eventually decreasing as the finite size of the population
becomes important. Consequently, an improved definition of $\mathcal{R}_0$ is the limit of $\mathcal{R}_{0,r}$ as r grows, subject to
the assumption that $\mathcal{R}_{0,r}$ is unaffected by the finite size of G. This gives (cf. [42])

$$\mathcal{R}_0 = \lim_{r \to \infty} \lim_{|G| \to \infty} \mathcal{R}_{0,r} , \tag{6}$$

and generalises the definition given by [1] for ODE models. Under this definition, epidemics are possible if
$\mathcal{R}_0 > 1$, but not if $\mathcal{R}_0 < 1$. We discuss this further in [B]. In a large population, considering multiple index
cases with a single EPN gives a good estimate of $\mathbb{E}[N_r]$ and hence $\mathcal{R}_{0,r}$.
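The estimate of the rank reproductive ratio from several index cases can be sketched as follows, taking the ratio of expectations $\mathbb{E}[N_{r+1}]/\mathbb{E}[N_r]$ as the ratio of rank counts summed over outbreaks. The function name `rank_reproductive_ratios` and the input format (one dictionary of node ranks per index case) are assumptions made for this illustration.

```python
from collections import Counter


def rank_reproductive_ratios(rank_maps):
    """Estimate the rank reproductive ratio for r = 0, 1, ... from outbreaks.

    rank_maps -- non-empty list of dicts (node -> rank), one per index case.
    Since every outbreak enters with equal weight, the ratio of summed counts
    equals the ratio of sample means of N_{r+1} and N_r.
    """
    totals = Counter()
    for ranks in rank_maps:
        totals.update(ranks.values())  # adds N_r of this outbreak for every r
    max_rank = max(totals)
    return [totals[r + 1] / totals[r] for r in range(max_rank)]
```

In a finite population the later entries of the returned list decrease as susceptibles are depleted, so only the early plateau is informative.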

2.2 Configuration Model Networks 

We consider two different types of networks. The first is a class of (unclustered) random networks for which 
we can derive analytic results based only on the degree distribution. These analytic results will form the 
leading order term of our perturbation expansions. The second is a more complicated network resulting from 
an agent-based simulation, which we will use to demonstrate the accuracy of our perturbation expansions. 

Our random networks are created by an algorithm which has been discovered independently a number
of times (see e.g., [34] and [6]). These have come to be called Configuration Model (CM) [38] networks.
These networks are maximally random given the degree distribution. As the number of nodes in a CM 
network grows, the frequency of short cycles becomes negligible. The resulting lack of clustering allows us 
to calculate analytic results for epidemics. We briefly discuss these results assuming T is constant. More 
details are in ^ ^6, 30, |32l |39^ 23\ and O (which also addresses edge weights). 

In the early stages of an outbreak in a CM network, the probability that a newly infected (non-index
case) node has degree k is $kP(k)/\langle k \rangle$. Clustering is unimportant and so the node will have $k - 1$ susceptible
neighbours, regardless of its rank. Thus the expected number of infections caused by a newly infected node
is

$$\mathcal{R}_0 = T \frac{\langle k^2 - k \rangle}{\langle k \rangle} . \tag{7}$$

To calculate the probability V that infection of a randomly chosen index case results in an epidemic, we
instead calculate the probability $f = 1 - V$ that it does not. Then f is the probability that each neighbour
of the index case either is not infected, or is infected but does not start an epidemic. Defining h to be the
probability that a secondary case does not start an epidemic,

$$f = \sum_k P(k) [1 - T + Th]^k . \tag{8}$$

We find a similar relation for h, except that the probability for a secondary case to have degree k is $kP(k)/\langle k \rangle$
and only $k - 1$ neighbours are susceptible:

$$h = \frac{1}{\langle k \rangle} \sum_k k P(k) [1 - T + Th]^{k-1} . \tag{9}$$

We solve this recurrence relation for h numerically, and use the result to find f. V follows immediately.
Because T is constant, this also gives A [36, 32].

If T is not constant, the calculation becomes more difficult, and is discussed further in [C] and [23, 32]. In
general, if T can vary for CM networks, $\mathcal{R}_0 = \langle T \rangle \langle k^2 - k \rangle / \langle k \rangle$, while V and A are overestimated by the
values calculated assuming constant T.
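For constant T, the unclustered predictions can be evaluated numerically with a short fixed-point iteration. The sketch below assumes the relations $f = \sum_k P(k)[1 - T + Th]^k$ and $h = \frac{1}{\langle k \rangle}\sum_k kP(k)[1 - T + Th]^{k-1}$ together with $\mathcal{R}_0 = T\langle k^2 - k \rangle/\langle k \rangle$; the function names, the dictionary input for the degree distribution, and the choice of simple iteration from h = 0 are illustrative assumptions rather than the solver used for the results in this paper.

```python
def cm_r0(degree_pmf, T):
    """Unclustered basic reproductive ratio T <k^2 - k> / <k>."""
    mean_k = sum(k * p for k, p in degree_pmf.items())
    mean_k2_minus_k = sum(k * (k - 1) * p for k, p in degree_pmf.items())
    return T * mean_k2_minus_k / mean_k


def cm_epidemic_probability(degree_pmf, T, tol=1e-12, max_iter=100000):
    """Epidemic probability V = 1 - f for a CM network with constant T.

    degree_pmf -- dict mapping degree k to its probability P(k)
    Iterating from h = 0 increases h monotonically towards the smallest
    fixed point in [0, 1], which is the probabilistically relevant root.
    """
    mean_k = sum(k * p for k, p in degree_pmf.items())
    h = 0.0
    for _ in range(max_iter):
        g = 1.0 - T + T * h
        h_new = sum(k * p * g ** (k - 1)
                    for k, p in degree_pmf.items() if k > 0) / mean_k
        converged = abs(h_new - h) < tol
        h = h_new
        if converged:
            break
    g = 1.0 - T + T * h
    f = sum(p * g ** k for k, p in degree_pmf.items())
    return 1.0 - f
```

Because T is constant, the same value also gives the attack rate A. Convergence is slow close to the epidemic threshold, where the slope of the iteration map at the fixed point approaches 1.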

2.3 The EpiSimS Network 

We are interested in understanding the impact of clustering on disease spread. The term clustering is rather 
vague, and is usually measured by the number of triangles in a network [43]. However, any sufficiently short
cycles impact the spread of an infectious disease. For our purposes we think of a clustered network as a 
network with enough short cycles to impact disease dynamics. 

It is relatively simple to measure the degree distribution of a population using survey methods. We can 
easily calculate V, A, and $\mathcal{R}_0$ for a CM network with the same degree distribution, but the error between
these values and the values for the original clustered network is unknown. Our goal in this paper is to
develop analytical techniques to quantify these errors. 

To test our predictions we turn to an agent-based network derived from a single EpiSimS [TTl [151 IS] 
simulation of Portland, Oregon. The simulation includes roads, buildings, and a statistically accurate (based 
on Census data) population of approximately 1.6 million people who perform daily tasks based on popula- 
tion surveys. This gives a highly detailed knowledge of the interactions in the synthetic population. The 
degree distribution and contact structure emerge from the simulation. The resulting network has significant 
clustering and average degree of about 16. More details are in [D].

3 Clustered networks with homogeneous nodes 

In this section we assume that the population is homogeneous and all contacts are equally weighted.
Consequently transmissibility is constant: $T_{uv} = T$ for all edges. It follows that V = A [36, 32]. We develop a
predictive theory for V, A, and $\mathcal{R}_0$ and test the theory with simulations on the EpiSimS network. We begin
with $\mathcal{R}_0$.

3.1 The basic reproductive ratio 

The simulated rank reproductive ratio $\mathcal{R}_{0,r}$ is shown in figure [2] for $0 \le r \le 4$. At all values of T, $\mathcal{R}_{0,0} = T \langle k \rangle$
is clearly distinct from $\mathcal{R}_{0,r}$, $r > 0$ (which are close together). For $r > 0$, $\mathcal{R}_{0,r}$ is asymptotic to the unclustered
approximation $T \langle k^2 - k \rangle / \langle k \rangle$ as $T \to 0$. This is because at small T the disease only rarely follows all edges
of short cycles and so clustering has no impact. As T increases, these curves lie significantly below the
unclustered approximation, because clustering reduces the number of available susceptibles. $\mathcal{R}_{0,4}$ peels away
from $\mathcal{R}_{0,1}$, $\mathcal{R}_{0,2}$, and $\mathcal{R}_{0,3}$ for larger T because the population is finite, and so the number of susceptibles
available to infect after rank four is reduced. In larger populations, $\mathcal{R}_{0,4}$ would not deviate.

Figure 2: Simulated values of the rank reproductive ratio $\mathcal{R}_{0,r} = \mathbb{E}[N_{r+1}]/\mathbb{E}[N_r]$ for $r = 0, \ldots, 4$ using
an EPN from the (fixed) EpiSimS network with a homogeneous population, compared with the unclustered
prediction. At small T (right panel) $\mathcal{R}_{0,1}$ through $\mathcal{R}_{0,4}$ match the unclustered prediction.

Figure 3: Different options for paths of length two between nodes u and v.

We conclude that $\mathcal{R}_{0,r}$ converges quickly, and that $\mathcal{R}_{0,1}$ is a good approximation to $\mathcal{R}_0$, but $\mathcal{R}_{0,0}$ is
not. This implies that the network has important structure contained in paths of length 2, but not in paths
of length 3. This fortunate observation allows us to approximate $\mathcal{R}_0$ by $\mathcal{R}_{0,1}$, which we may analytically
calculate with relative ease ($\mathcal{R}_{0,r}$ becomes combinatorially hard as r grows). To find $\mathcal{R}_{0,1} = \mathbb{E}[N_2]/\mathbb{E}[N_1]$ we
first note that $\mathbb{E}[N_1] = T \langle k \rangle$. Calculating $\mathbb{E}[N_2]$ is more difficult: consider all pairs of nodes u and v with
at least one path of length 2 between them. Let $n_{uv}$ be the number of paths of length 2 between u and v
and $\chi_{uv}$ be an indicator function: $\chi_{uv} = 1$ if {u, v} is an edge and $\chi_{uv} = 0$ if it is not (see figure [3]). The
probability that an infection of u results in infection of v in exactly two steps is $[1 - (1 - T^2)^{n_{uv}}][1 - T]^{\chi_{uv}}$.
Summing this over all pairs yields



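$$\mathbb{E}[N_2] = \frac{1}{N} \sum_u \sum_{v \ne u} \left[1 - (1 - T^2)^{n_{uv}}\right] [1 - T]^{\chi_{uv}}$$

(where N is the size of the population and each pair u and v appears twice) which allows us to calculate
$\mathcal{R}_{0,1}$ exactly. This sum is straightforward to calculate, but we can increase our understanding with a small
T expansion. We approximate $\mathbb{E}[N_2]$ for $T \ll 1$ by

$$\mathbb{E}[N_2] = T^2 \langle k^2 - k \rangle - 2T^3 \langle n_\triangle \rangle - T^4 \langle n_\square \rangle + O(T^5) ,$$

where $\langle n_\triangle \rangle = \frac{1}{2N} \sum_u \sum_{v \ne u} n_{uv} \chi_{uv}$ is the average number of triangles each node is in, and $\langle n_\square \rangle = \frac{1}{N} \sum_u \sum_{v \ne u} \binom{n_{uv}}{2}$
is the average number of squares each node is in (cf. [20]). Higher order terms involve more complicated
shapes. This gives

$$\mathcal{R}_{0,1} = T \frac{\langle k^2 - k \rangle}{\langle k \rangle} - 2T^2 \frac{\langle n_\triangle \rangle}{\langle k \rangle} - T^3 \frac{\langle n_\square \rangle}{\langle k \rangle} + O(T^4) . \tag{10}$$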

At leading order we recover the unclustered prediction for $\mathcal{R}_0$, reflecting the fact that at small T the
probability the outbreak follows all edges of a cycle is negligible. As T increases, the first corrections are
due to triangles, then squares, then pairs of triangles sharing an edge, and sequentially larger and larger
structures made up of paths of length two. A comparison of these approximations with the exact value is
shown in figure [4].

Figure 4: Comparison of first three asymptotic approximations for $\mathcal{R}_{0,1}$ from equation ([10]) with the exact
value (solid) for the EpiSimS network. The right panel shows the comparison at small T.
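The exact rank-one ratio and the triangle and square averages entering its small T expansion can be computed directly from an adjacency structure. The sketch below assumes a simple undirected network stored as a dictionary mapping each node to the set of its neighbours, constant T > 0, and the pair sum $\mathbb{E}[N_2] = \frac{1}{N}\sum_u \sum_{v \ne u} [1 - (1 - T^2)^{n_{uv}}][1 - T]^{\chi_{uv}}$; the function names are hypothetical and chosen for the example.

```python
def path2_counts(adj):
    """Return counts[u][v] = number of paths u-w-v of length two, for v != u."""
    counts = {}
    for u in adj:
        row = {}
        for w in adj[u]:
            for v in adj[w]:
                if v != u:
                    row[v] = row.get(v, 0) + 1
        counts[u] = row
    return counts


def rank_one_ratio(adj, T):
    """Exact E[N_2] / E[N_1] for constant transmissibility T > 0."""
    n_nodes = len(adj)
    mean_k = sum(len(nbrs) for nbrs in adj.values()) / n_nodes
    total = 0.0
    for u, row in path2_counts(adj).items():
        for v, n_uv in row.items():
            reached = 1.0 - (1.0 - T * T) ** n_uv        # some two-step path succeeds
            not_direct = (1.0 - T) if v in adj[u] else 1.0  # v is not infected at rank 1
            total += reached * not_direct
    return (total / n_nodes) / (T * mean_k)


def mean_triangles_and_squares(adj):
    """Average number of triangles and of squares that each node is in."""
    n_nodes = len(adj)
    counts = path2_counts(adj)
    tri = sum(n for u, row in counts.items() for v, n in row.items() if v in adj[u])
    squ = sum(n * (n - 1) // 2 for row in counts.values() for n in row.values())
    return tri / (2 * n_nodes), squ / n_nodes
```

For a single triangle the exact ratio is T(1 - T), and for a single four-cycle it is T - T^3/2, which agree with the corresponding terms of the expansion.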

Although we have defined $\mathcal{R}_0$ for an ensemble of realisations, figure [5] shows that $\mathcal{R}_{0,1}$ accurately predicts
the observed ratio $N_{r+1}/N_r$ for individual simulations once the outbreaks are well-established. Early in
outbreaks, the behaviour is dominated by stochastic effects, and so the ratio of successive rank sizes is noisy.
Once the outbreak has grown large enough, random events become unimportant and the ratio settles at
$\mathcal{R}_{0,1}$.




Figure 5: The progression of ten simulated epidemics for (left) T = 0.1 and (right) T = 0.2 in the EpiSimS
network. The left panels show $N_{r+1}/N_r$ against rank and right panels show the cumulative fraction of the
population infected.

Figure 6: Probability V and attack rate A of epidemics for the (clustered) EpiSimS network (+) versus T,
compared to the prediction derived from the degree distribution assuming no clustering. Each data point is
from a single EPN (the variation in V resulting from different EPNs is negligible).



3.2 Epidemic probability and size 

In order to assess the effect of clustering on V and A, we compare epidemics on the EpiSimS network with
the analytic predictions derived assuming a CM network of the same degree distribution in figure [6]. The
epidemic threshold is not noticeably altered, and the values of V and A are almost indistinguishable from 
the predictions made assuming no clustering, despite the large amount of clustering in the network. 

Although initially surprising, these results may be understood intuitively as follows: if T is large enough 
that the disease follows all edges of a short cycle then some other edge from a node of that cycle is likely to 
start an epidemic and the cycle does not prevent an epidemic. On the other hand, if T is smaller so that it 
does not follow all edges of a cycle, then the disease never sees the existence of the cycle, and the outbreak 
progresses as if there were no cycle. 

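To make this more rigorous, we first look at the epidemic threshold. We assume $\mathcal{R}_0$ is well-approximated
by $\mathcal{R}_{0,1}$. Let $T_0 = \langle k \rangle / \langle k^2 - k \rangle$ be the threshold without clustering and $T_0 + \delta T$ be the threshold found by
including the correction due to triangles. From equation ([10]) it follows that

$$\delta T = \frac{2 \langle n_\triangle \rangle \langle k \rangle^2}{\langle k^2 - k \rangle^3} + O\left( \frac{\langle n_\triangle \rangle^2 \langle k \rangle^3}{\langle k^2 - k \rangle^5} \right) . \tag{11}$$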

Because a given node of degree k is contained in at most $(k^2 - k)/2$ triangles, we conclude $2 \langle n_\triangle \rangle / \langle k^2 - k \rangle \le 1$.
So if $\langle k \rangle / \langle k^2 - k \rangle$ is small, the leading order term of equation ([11]) is small and triangles do not significantly
alter the epidemic threshold regardless of the density of triangles. For the EpiSimS network, $\langle k \rangle / \langle k^2 - k \rangle$
takes the value 0.046, and so we do not anticipate clustering to play an important role in determining the
threshold.

^ Early noise controls how quickly outbreaks become epidemics, and so once stochastic effects become small, the curves
appear to be translations in time. We note that it is common to consider the temporal average of a number of outbreaks.
However, prior to taking an average, the curves should be shifted in time so that they coincide once the stochastic effects are
no longer important. Failure to do so underestimates the early growth, peak incidence, and late decay while it overestimates
the epidemic duration. This can lead to an incorrect understanding of "typical" outbreaks.
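The size of the triangle correction to the threshold can be checked numerically. The sketch below assumes the expansion of $\mathcal{R}_{0,1}$ truncated after the triangle term, so that the threshold solves $T \langle k^2 - k \rangle - 2T^2 \langle n_\triangle \rangle = \langle k \rangle$; it compares the smaller root of this quadratic with the leading-order shift $2\langle n_\triangle \rangle \langle k \rangle^2 / \langle k^2 - k \rangle^3$. The function name and the input moments used in the usage note are hypothetical and are not EpiSimS values.

```python
import math


def threshold_shift(mean_k, mean_k2_minus_k, mean_triangles):
    """Return (T0, leading-order shift, shift from the truncated quadratic).

    The third entry is None when there are no triangles or when the
    truncated quadratic has no real root.
    """
    t0 = mean_k / mean_k2_minus_k
    leading = 2.0 * mean_triangles * mean_k ** 2 / mean_k2_minus_k ** 3
    exact = None
    if mean_triangles > 0:
        disc = mean_k2_minus_k ** 2 - 8.0 * mean_k * mean_triangles
        if disc >= 0:
            # Smaller root of 2 <n_tri> T^2 - <k^2 - k> T + <k> = 0.
            root = (mean_k2_minus_k - math.sqrt(disc)) / (4.0 * mean_triangles)
            exact = root - t0
    return t0, leading, exact
```

With the hypothetical moments `threshold_shift(10, 100, 5)`, the unclustered threshold is 0.1 and both estimates of the shift are close to 0.001, about one percent of the threshold itself.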

Above threshold, we assume that V may be expanded much like ([10]):

$$V = V_0 + V_1 \langle n_\triangle \rangle + V_2 \langle n_\triangle \rangle^2 + \cdots + Q_1 \langle n_\square \rangle + \cdots , \tag{12}$$

where $V_0$ is the epidemic probability in a CM network of the same degree distribution. Although calculating
$\mathcal{R}_{0,1}$ only requires information about nodes of distance at most two from the index case, V may depend on
effects occurring at larger distance, and so the expansion has many additional terms. In general, we expect
that if the average degree is large, then the various coefficients of the correction terms are all small. The
larger a structure is, the smaller we expect its corresponding coefficient to be. The coefficient for triangles
$V_1$ may be found by

$$V_1 \langle n_\triangle \rangle = -\frac{1}{N} \sum_{u \in G} \sum_{\triangle \in G} p_\triangle(u) ,$$

where $p_\triangle(u)$ is the probability that a given triangle prevents an epidemic if u is the index case (regardless
of whether u is part of the triangle). Reversing the order of summation we get

$$V_1 \langle n_\triangle \rangle = -\frac{N_\triangle}{N} \left\langle \sum_{u \in G} p_\triangle(u) \right\rangle_\triangle = -\frac{\langle n_\triangle \rangle}{3} \left\langle \sum_{u \in G} p_\triangle(u) \right\rangle_\triangle ,$$

where $N_\triangle$ is the number of triangles in G and $\langle \cdot \rangle_\triangle$ is the average of the given quantity taken over all
triangles. Thus

$$V_1 = -\frac{1}{3} \left\langle \sum_{u \in G} p_\triangle(u) \right\rangle_\triangle ,$$

and we can find $V_1$ by considering the average effect of a single triangle in an unclustered network.

To calculate the impact of a triangle with nodes u, v, and w on V for a given network, we consider that
triangle and a randomly chosen edge {x, y} elsewhere in the network. If we replace the edges {v, w} and
{x, y} with {v, x} and {w, y}, then we have a new network without the triangle, but with the same degree
distribution. We must estimate the expected change in V caused by switching the edges.
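The edge switch described above can be sketched as a small degree-preserving rewiring routine. The network is assumed to be stored as a dictionary mapping each node to the set of its neighbours; the function name `swap_edges` and the decision to reject switches that would create a duplicate edge are assumptions made for this example.

```python
def swap_edges(adj, v, w, x, y):
    """Replace edges {v, w} and {x, y} by {v, x} and {w, y}.

    Returns a new adjacency dict and leaves the input unchanged.
    Every node keeps its degree.
    """
    if w not in adj[v] or y not in adj[x]:
        raise ValueError("both {v, w} and {x, y} must be existing edges")
    if len({v, w, x, y}) < 4:
        raise ValueError("the four nodes must be distinct")
    if x in adj[v] or y in adj[w]:
        raise ValueError("the switch would create a duplicate edge")
    new_adj = {node: set(nbrs) for node, nbrs in adj.items()}
    new_adj[v].remove(w)
    new_adj[w].remove(v)
    new_adj[x].remove(y)
    new_adj[y].remove(x)
    new_adj[v].add(x)
    new_adj[x].add(v)
    new_adj[w].add(y)
    new_adj[y].add(w)
    return new_adj
```

Applied to a triangle u, v, w and a distant edge {x, y}, the routine removes the edge {v, w}, so the triangle disappears while the degree sequence is unchanged.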

We begin by assuming u is the index case. The triangle can affect V only if the infection tries to cross all
three edges, that is, if the infection process 'loses' an edge because of clustering. This may happen in three
distinct ways. In the first, node u infects both v and w, and then v and/or w tries to infect the other. In
the second, u infects v but not w, then v infects w, and finally w tries to infect u. The third is symmetric to
the second (with u infecting w).

To leading order we can ignore other short cycles, so the probability that an edge leading out of u (not to
v or w) will not cause an epidemic is $g = 1 - T + Th$, where h (as before) is the probability that a randomly
chosen secondary case does not cause an epidemic in an unclustered network and can be calculated using
equation ([9]).

We perform a sample calculation with the first case: $u$ infects both $v$ and $w$. Assume that $u$ has degree
$k_u$, $v$ has degree $k_v$, and $w$ has degree $k_w$. The probability that $u$ infects both $v$ and $w$ without some other
edge leading from $u$, $v$, or $w$ starting an epidemic is $T^2 g^{k_u+k_v+k_w-6}$. If the $\{v, w\}$ edge were broken and $v$
and $w$ were joined to $x$ and $y$ respectively (see figure 7), then the new probability that $u$ infects both $v$ and $w$
without an epidemic becomes $T^2 g^{k_u+k_v+k_w-4}$. The difference is $T^2 g^{k_u+k_v+k_w-6} [1 - g^2]$, which is the product
of three terms, all at most 1. If the sum $k_u + k_v + k_w$ is moderately large, then either $g^{k_u+k_v+k_w-6} \ll 1$
or $1 - g^2 \ll 1$ (if $g$ is not close to 1 then the first term is small, otherwise the second term is small). Thus
the triangle has little impact on the epidemic probability in this case. Similar analysis applies to the other
two cases where the $w$ to $u$ or $v$ to $u$ infections are lost. Provided the typical sum of degrees of nodes in
a triangle is relatively large, the probability of an epidemic when the index case is in the triangle is not
impacted significantly.

^If $\mathcal{P}$ is small, then the relative change may be large, but the absolute change is small.

Figure 7: Replacing the edges $\{v, w\}$ and $\{x, y\}$ with $\{v, x\}$ and $\{w, y\}$ breaks the triangle and allows more
infections, without affecting the degree distribution.
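
The size of this effect can be checked numerically. The following is a minimal sketch, assuming a Poisson degree distribution, for which the self-consistency condition for $h$ takes the standard form $h = \exp[\langle k \rangle T (h - 1)]$. That form, the function names, and the parameter values are illustrative assumptions and are not taken from equation (9) itself.

```python
import math

def solve_h(T, mean_degree, tol=1e-12, max_iter=100000):
    """Probability that a secondary case does not start an epidemic.

    Assumes an unclustered network with Poisson degrees, so that
    h = exp(mean_degree * T * (h - 1)). Iterating from h = 0 converges
    to the smallest non-negative solution.
    """
    h = 0.0
    for _ in range(max_iter):
        h_new = math.exp(mean_degree * T * (h - 1.0))
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    return h

def triangle_effect(T, h, k_u, k_v, k_w):
    """Change in the probability of 'u infects v and w, no epidemic' when
    the edge {v, w} is rewired to {v, x} and {w, y}."""
    g = 1.0 - T + T * h                      # an outside edge fails to start an epidemic
    total = k_u + k_v + k_w
    with_triangle = T**2 * g ** (total - 6)  # v-w edge is wasted inside the triangle
    rewired = T**2 * g ** (total - 4)        # v and w each gain one outside edge
    return with_triangle - rewired           # equals T^2 g^(total-6) (1 - g^2)

# Hypothetical parameter choices for illustration only.
h_high = solve_h(T=0.3, mean_degree=10.0)
h_low = solve_h(T=0.5, mean_degree=3.0)
print(triangle_effect(0.3, h_high, 10, 10, 10))
print(triangle_effect(0.5, h_low, 3, 3, 3))
```

With degrees of 10 the difference is several orders of magnitude smaller than with degrees of 3, in line with the argument that triangles matter for $\mathcal{P}$ only when typical degrees are small.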

If the index case is not part of the triangle, then the above analysis is modified because we must also
consider each node in the path from the index case to the triangle. We must first calculate the probability
that infection reaches a node in the triangle while simultaneously no intermediate node sparks an epidemic,
and then we calculate the probability as above that the triangle prevents an epidemic. If the index case is
$u_1$ and the path from $u_1$ to the triangle goes through $u_2, \ldots, u_n$ and then reaches $u$, then the probability
that the triangle prevents an epidemic $p(u_1)$ is given, to leading order, by $T^n g^{(k_{u_1}-1)+\sum_{i=2}^{n}(k_{u_i}-2)} p(u)$. This falls off very quickly,
and so nodes not in the triangle are unimportant, unless typical degrees are small.

In contrast, in a network with small average degree and a significant number of triangles the contribution
of triangles becomes significant. This explains observations of [41, 40] who use networks with average degree less than 3 and find
that clustering significantly alters $\mathcal{A}$.

It is tempting to generalise our conclusion and state that if the average degree is large, clustering has
no impact on $\mathcal{P}$ or $\mathcal{A}$. However, there are a number of counter-examples: consider a network made up of
isolated cliques with $N_c$ nodes, then in expansion (12) the coefficient for cliques of $N_c$ nodes will not be small.
Consequently care must be taken when using such an expansion to ensure that neglected terms resulting
from larger scale structures are in fact negligible. For social networks, we generally anticipate this highly
segregated situation to be unimportant.

We conclude that for most reasonable networks, clustering is only important for $\mathcal{P}$ and $\mathcal{A}$ if the typical
degrees of nodes are low, in which case $\mathcal{R}_0$ is small. A consequence of these results is that if $\mathcal{R}_0$ is moderately
large, then $\mathcal{P}$ and $\mathcal{A}$ are effectively unaltered by clustering. If $\mathcal{R}_0$ is small, however, clustering may or may
not play a role in determining $\mathcal{P}$ and $\mathcal{A}$, depending on whether $\mathcal{R}_0$ is small because the degrees are small or
because $T$ is small.

4 Clustered networks with heterogeneous nodes 

When we drop the assumption of constant transmissibility, disease spread becomes more complicated. If $\mathcal{I}$
is heterogeneous and $u$ infects a neighbour, then the a posteriori expectation for $T_{out}(u)$ becomes higher:
it is likely to infect more neighbours. This accentuates the effect of short cycles, enhancing the impact of
clustering on $\mathcal{R}_0$, $\mathcal{P}$, and $\mathcal{A}$. A similar argument applies with heterogeneity in $\mathcal{S}$: if $v$ is not infected by one
of its neighbours, then the a posteriori expectation for $T_{in}(v)$ becomes lower: it is less likely to be infected
by other neighbours, and so has multiple opportunities to prevent an epidemic. Again this accentuates the
effect of short cycles.

In this section we investigate how varying the infectiousness and susceptibility of nodes in the EpiSimS
network enables clustering to alter the values of $\mathcal{P}$ and $\mathcal{A}$. We will make use of the ordering assumption and
its consequences from [33]: if $u_1$ is "more infectious" than $u_2$ in a given instance [or $v_1$ "more susceptible"
than $v_2$], then $u_1$ is always more infectious than $u_2$ [or $v_1$ always more susceptible than $v_2$]. More specifically,
the ordering assumption states that if $T_{out}(u_1) > T_{out}(u_2)$, then $T(\mathcal{I}_{u_1}, \mathcal{S}) > T(\mathcal{I}_{u_2}, \mathcal{S})$ for all $\mathcal{S}$, and the
corresponding statement for $T_{in}$. The results of that work show that if the ordering assumption holds, heterogeneity
tends to reduce $\mathcal{P}$ and $\mathcal{A}$, and the upper bounds on $\mathcal{P}$ and $\mathcal{A}$ correspond to homogeneous populations
(constant $T$).

Table 1: For the calculations of sections 4 and 5 we determine $T_{uv}$ using equations (2) and (1) with the
distributions of $\mathcal{I}$ and $\mathcal{S}$ given in the first four rows, or by considering a maximally heterogeneous population
for which $\langle T \rangle$ of the population infects all neighbours and $1 - \langle T \rangle$ infect no neighbours. The function $\delta$ is
the Dirac delta function.

For simulations in this section, we consider five different illustrative cases, which will be denoted throughout
by the symbol given in table 1. In the first four, we use equation (2) so that $T_{uv} = 1 - e^{-\alpha \mathcal{I}_u \mathcal{S}_v}$ with the
distribution of $\mathcal{I}$ and $\mathcal{S}$ varying for each. We vary $\alpha$ to change the average transmissibility. In the fifth case
the out-transmissibility is maximally heterogeneous: a fraction $\langle T \rangle$ of the population infect all neighbours,
while the remaining $1 - \langle T \rangle$ infect no neighbours.

The fifth case gives a lower bound on $\mathcal{P}$ for a homogeneously susceptible population [42]. It is hypothesised
to remain a lower bound on $\mathcal{P}$ if susceptibility is allowed to vary [33]. We could also consider maximal
heterogeneity in susceptibility, but the results for $\mathcal{P}$ and $\mathcal{A}$ merely correspond to interchanging their values
for maximal heterogeneity in infectiousness, and so we do not need to consider it explicitly.

4.1 The basic reproductive ratio 

We use simulations to calculate the rank reproductive ratio $\mathcal{R}_{0,r}$ for the cases of table 1 and plot the result
for $0 \le r \le 4$ in figure 8. Note that $\mathcal{R}_{0,1}$ remains a good approximation to $\mathcal{R}_0$. In the first four cases, $\mathcal{R}_0$ is
again asymptotic to the unclustered approximation as $\langle T \rangle \to 0$. There are small kinks for O and ■
at $\langle T \rangle = 0.5$ and $\langle T \rangle = 0.7$ respectively, resulting from the nature of those distributions. The heterogeneities
act to enhance the effect of clustering on $\mathcal{R}_0$, but the effect is relatively small.

In the final, maximally heterogeneous case O, $\mathcal{R}_{0,1}$ remains a good approximation to $\mathcal{R}_0$. At small
values of $\langle T \rangle$, the heterogeneity causes clustering to have a larger impact than in a homogeneous population,
as seen in the lower right panel of figure 8, and so this is not asymptotic to the unclustered approximation.
At larger values of $\langle T \rangle$ the heterogeneous and homogeneous growth rates are similar.

As before, we can calculate $\mathcal{R}_{0,1}$ analytically, which helps explain our observations. If the ordering
assumption holds, we may use a simplified notation $T(T_{out}, T_{in})$ to denote the transmissibility from a node
with out-transmissibility $T_{out}$ to a node with in-transmissibility $T_{in}$. We have $\mathbb{E}[N_1] = \langle T \rangle \langle k \rangle$ and

^[^2] = l^Y.Y. ToutT^nT-][l - T{TouuT,n)]''-Qout{Tout)Q^n{T^n)dToutdT,n 

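$$= \langle k^2 - k \rangle \langle T \rangle^2 - 2 \langle n_\triangle \rangle \langle T_{out} T_{in} T(T_{out}, T_{in}) \rangle - \langle n_\square \rangle \langle T_{out}^2 \rangle \langle T_{in}^2 \rangle + \cdots ,$$
and so we may express the growth rate as a perturbation about the unclustered case $\mathcal{R}_0 = \langle T \rangle \langle k^2 - k \rangle / \langle k \rangle$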

^We can use this notation because the ordering assumption allows us to uniquely identify $\mathcal{I}$ from $T_{out}$ and $\mathcal{S}$ from $T_{in}$. If
the ordering assumption fails, similar results hold, but the notation is more cumbersome.

Figure 8: $\mathcal{R}_{0,r} = \mathbb{E}[N_{r+1}] / \mathbb{E}[N_r]$ calculated from simulations for the heterogeneous examples of table 1.
The final panel (lower right) compares $\mathcal{R}_{0,1}$ for all of the different cases, including both unclustered and
homogeneous.



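giving

$$\mathcal{R}_{0,1} = \frac{\langle T \rangle \langle k^2 - k \rangle}{\langle k \rangle} - \frac{2 \langle n_\triangle \rangle \langle T_{out} T_{in} T(T_{out}, T_{in}) \rangle}{\langle T \rangle \langle k \rangle} - \frac{\langle n_\square \rangle \langle T_{out}^2 \rangle \langle T_{in}^2 \rangle}{\langle T \rangle \langle k \rangle} + \cdots . \qquad (13)$$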

For the second term, it may be shown that $\langle T \rangle^3 \le \langle T_{out} T_{in} T(T_{out}, T_{in}) \rangle \le \langle T \rangle^2$. The minimum occurs
when $T$ is constant, suggesting that the maximum growth rate occurs in a homogeneous population. The
maximum $\langle T \rangle^2$ occurs either for

$$Q_{out}(T_{out}) = (1 - \langle T \rangle) \delta(T_{out}) + \langle T \rangle \delta(T_{out} - 1) , \qquad (14)$$

that is, when the out-transmissibility is maximally heterogeneous, or when the in-transmissibility is maximally
heterogeneous:

$$Q_{in}(T_{in}) = (1 - \langle T \rangle) \delta(T_{in}) + \langle T \rangle \delta(T_{in} - 1) . \qquad (15)$$

Consequently, we expect that for given $\langle T \rangle$ the minimum growth rate occurs with maximally heterogeneous
infectiousness or susceptibility. These two minima for $\mathcal{R}_{0,1}$ have previously been hypothesised to give lower
bounds on $\mathcal{P}$ and $\mathcal{A}$ respectively [33].

Figure 9: Comparison of $\mathcal{P}$ and $\mathcal{A}$ observed from simulations in the clustered EpiSimS network with
heterogeneities (symbols) with that predicted by the unclustered theory (curves) using table 1. Each data
point is based on a single EPN. For both ■ and O, $T_{in}(v) = \langle T \rangle$ for all nodes, and so the unclustered
prediction for $\mathcal{A}$ is the same.

We note that in the maximally heterogeneous case, the correction term in (13) is significant at leading
order in $T$. Consequently, if $\langle n_\triangle \rangle$ is comparable to $\langle k^2 - k \rangle / 2$ (that is, the clustering coefficient [43] is
comparable to 1), the threshold value of $\langle T \rangle$ may be increased by clustering, and $\mathcal{R}_0$ is not asymptotic to
the unclustered prediction as $\langle T \rangle \to 0$.
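
The two extreme cases can be checked directly. The short derivation below is a sketch under two assumptions that are not spelled out in the text: $T_{out}$ and $T_{in}$ are drawn independently, and the triangle correction to $\mathcal{R}_{0,1} = \mathbb{E}[N_2] / \mathbb{E}[N_1]$ is $2 \langle n_\triangle \rangle \langle T_{out} T_{in} T(T_{out}, T_{in}) \rangle / (\langle T \rangle \langle k \rangle)$, with squares and longer cycles neglected.

```latex
% Homogeneous population: T(T_out, T_in) = T for every pair
\langle T_{out} T_{in} T(T_{out}, T_{in}) \rangle = T^3 = \langle T \rangle^3 .

% Maximally heterogeneous infectiousness, equation (14):
% T_out is 0 or 1 with Prob(T_out = 1) = <T>, T_in = <T> for all nodes,
% and T(T_out, T_in) = T_out, so that T_out^2 = T_out
\langle T_{out} T_{in} T(T_{out}, T_{in}) \rangle
  = \langle T \rangle \langle T_{out}^2 \rangle
  = \langle T \rangle \langle T_{out} \rangle
  = \langle T \rangle^2 .

% Resulting triangle corrections (homogeneous versus maximally heterogeneous)
\frac{2 \langle n_\triangle \rangle \langle T \rangle^3}{\langle T \rangle \langle k \rangle}
  = \frac{2 \langle n_\triangle \rangle}{\langle k \rangle} \langle T \rangle^2
\qquad \text{versus} \qquad
\frac{2 \langle n_\triangle \rangle \langle T \rangle^2}{\langle T \rangle \langle k \rangle}
  = \frac{2 \langle n_\triangle \rangle}{\langle k \rangle} \langle T \rangle .
```

In the maximally heterogeneous case the correction is linear in $\langle T \rangle$, the same order as the unclustered term $\langle T \rangle \langle k^2 - k \rangle / \langle k \rangle$, which is why clustering can then shift the threshold.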



4.2 Probability and size 

Figure 9 shows that the unclustered predictions provide a good estimate of $\mathcal{P}$ and $\mathcal{A}$ in the clustered EpiSimS
network. We expect that in a network with sufficiently large average degree, the impact of clustering should
once again be small.

We use arguments similar to before, taking a triangle with nodes $u$, $v$, and $w$. The reasoning becomes
more difficult because knowledge that u infects v may increase the expectation that u infects w. Consequently 
the lost edges in triangles are more frequently encountered by the outbreak. However, the knowledge that 
u infects v also increases the expectation that u infects its other neighbours. For a triangle to prevent an 
epidemic, we need both that no edge outside the triangle leads to an epidemic and that the lost edge would 
otherwise have caused an epidemic. If the typical degree of the network is not small, then the fact that the 
lost edge is encountered more frequently may be offset by the fact that when it is encountered, other edges 
are more likely to spark an epidemic. 

For O where nodes infect all or none of their neighbours, the effect of different triangles that share
the index case cannot be separated as easily. The probability the index case directly infects a set of $m$
nodes of interest is $\langle T \rangle$, rather than $T^m$. Thus expansions as in (12) do not work as well: terms that were
previously higher order become significant. Close to the epidemic threshold, this can play an important role.
However, well above the epidemic threshold, if the index case infects all of its neighbours, an epidemic is
almost guaranteed and so $\mathcal{P} \approx \langle T \rangle$ regardless of whether the network is clustered. Thus for O, clustering
affects $\mathcal{P}$ only close to the epidemic threshold.

In the opposite case where nodes would be infected by any neighbour or else no neighbour, the values of
$\mathcal{P}$ and $\mathcal{A}$ are interchanged. Thus for maximally heterogeneous susceptibility $\mathcal{P}$ could be significantly altered
close to the threshold. The reason for this is as follows: for the first step the spread is indistinguishable from
that of an outbreak with constant $T$. However, when infections of rank 1 attempt to infect their neighbours,
they cannot infect any of the neighbours of the index case. In contrast, in the constant $T$ case, any neighbour
not infected by the index case would be susceptible at later steps. Consequently, the impact of triangles
becomes much more important (by a factor of $1 / \langle T \rangle$) and our earlier argument for neglecting them fails.
The interaction of maximal heterogeneity with clustering in this case is larger, but it nevertheless becomes
unimportant far from the threshold.

Figure 10: $\mathcal{R}_{0,r}$, $\mathcal{P}$, and $\mathcal{A}$ for the weighted EpiSimS network with a homogeneous population.

Our prediction that heterogeneity allows clustering to be more significant close to the threshold is borne
out for # where there is relatively strong heterogeneity in susceptibility just above the epidemic threshold.
The epidemic threshold for O is increased compared to the other cases. In contrast there is much stronger
heterogeneity in susceptibility for O at $\langle T \rangle = 0.5$ and in infectiousness for ■ at $\langle T \rangle = 0.7$. This results in
a reduction in $\mathcal{A}$ and $\mathcal{P}$ respectively, but because it is far from threshold, there is little deviation from the
unclustered predictions.

5 Clustered networks with weighted edges 

When we allow edges to be weighted, new complications arise. The weights we use in our simulations are 
the durations of contacts from the EpiSimS simulation and are discussed in detail in [D1 If a contact in 
the original EpiSimS simulation is longer, a higher weight is assigned. If the weights of different edges 
were independent, then we could simply take $T_{uv} = \int T(\mathcal{I}_u, \mathcal{S}_v, w) P(w) \, dw$. However, edge weights are not
independent: clustered connections tend to have larger weights. If brief contacts are negligible, the disease
spreads on a subnetwork of the original network. The new network has a comparable number of short cycles 
to the original, but lower typical degree. This should enhance the impact of clustering. 

For our calculations in this section, we first isolate the impact of weighted edges by taking a homogeneous
population ($\mathcal{I} = \mathcal{S} = 1$) and using $T_{uv} = 1 - e^{-\alpha w_{uv}}$. We vary $\alpha$ in order to set $\langle T \rangle$. We then investigate a
heterogeneous population using equation (1) with the first four distributions of table 1.

Results for a homogeneous population are shown in figure 10. Because $T_{uv} = T_{vu}$ for all pairs, it follows
that $\mathcal{P} = \mathcal{A}$. If different edge weights were uncorrelated, then the value of $\mathcal{R}_0$ would match with figure 2 and
$\mathcal{P}$ and $\mathcal{A}$ would match with figure 6. We see, however, that $\mathcal{R}_0$ is significantly reduced from the homogeneous
unweighted population (but $\mathcal{R}_{0,1}$ remains a good approximation). Close to the threshold $\mathcal{P}$ and $\mathcal{A}$ are mildly
reduced. These observations are consistent with our expectation that clustering should be accentuated by
incorporating edge weights. Although the predictions for $\mathcal{P}$ and $\mathcal{A}$ are not far off, we expect that they would
improve if we adjusted the degree distribution to match that of the effective network on which the disease
spreads.

When the population is moderately heterogeneous (figure 11), we still find that $\mathcal{R}_{0,1}$ is a reasonable
approximation to the true value of $\mathcal{R}_0$; however, it slightly underestimates $\mathcal{R}_0$ as $\langle T \rangle$ grows. Unfortunately
the analytic calculation of $\mathcal{R}_{0,1}$ is much more difficult, and so it is more appropriate to use simulations to
estimate its value. If there were no correlation between weights of different edges, then the calculation would
reduce to that of section 4.

Figure 11: $\mathcal{R}_{0,r}$ with heterogeneous transmissibility and weighted edges on the EpiSimS network.

We consider $\mathcal{P}$ and $\mathcal{A}$ in figure 12. The unclustered predictions are reasonable approximations of the
actual values. The error is larger than before because we have combined two effects (edge weights and
heterogeneity) that both accentuate the impact of clustering. In spite of this, the predicted values of $\mathcal{P}$
and $\mathcal{A}$ are not far off, and the direction of the error is consistent: the unclustered prediction is always an
overestimate.

6 Discussion 

We have investigated the interplay of clustering, node heterogeneity, and edge weights on the growth rate
$\mathcal{R}_0$, probability $\mathcal{P}$, and size $\mathcal{A}$ of epidemics in social networks. For unclustered networks with independently
distributed edge weights, it is possible to predict all these quantities analytically. Under weak assumptions
we can accurately estimate $\mathcal{R}_0$, $\mathcal{P}$, and $\mathcal{A}$ for clustered networks.

If the typical degrees are not small, then for a given average transmissibility and degree distribution:

• The dominant effect controlling the growth rate of epidemics is clustering. Increased clustering reduces
$\mathcal{R}_0$.

• The dominant effect controlling the probability of epidemics is heterogeneity in infectiousness. Increased
heterogeneity reduces $\mathcal{P}$.

• The dominant effect controlling the size of epidemics is heterogeneity in susceptibility. Increased
heterogeneity reduces $\mathcal{A}$.

We are thus able to neglect clustering and still closely estimate $\mathcal{P}$ based only on the degree distribution
and the out-transmissibility pdf $Q_{out}$. The estimate for $\mathcal{A}$ depends only on degree distribution and the
in-transmissibility pdf $Q_{in}$. The impact of clustering is significant in altering $\mathcal{R}_0$, and its impact is mildly
enhanced by heterogeneities. This enhancement occurs because the probability of following all edges of a
cycle is increased if some of the edges are correlated due to the heterogeneity. If heterogeneity is large,
clustering may play a small role in moving the epidemic threshold, but otherwise its effect on the threshold
is negligible. In networks with small typical degree, it has been observed that clustering can modify $\mathcal{P}$ or
$\mathcal{A}$ [41, 40], which is consistent with our estimates.

If edge weights are included, but are independently distributed, then their impact is in modifying $Q_{in}(T_{in})$
and $Q_{out}(T_{out})$. The resulting modification may be calculated explicitly, and edge weights have no further
effect. If edge weights are correlated, they have a more important role in governing the behaviour of epidemics,
particularly if higher weight edges tend to be the clustered edges (as frequently occurs in social networks). If
this happens, then the impact of clustering is enhanced, and the growth rate of epidemics is further reduced.

When we move from predicting $\mathcal{P}$ and $\mathcal{A}$ to predicting $\mathcal{R}_0$, we find that the growth rate is well approximated
by $\mathcal{R}_{0,1} = \mathbb{E}[N_2] / \mathbb{E}[N_1]$. This may be calculated analytically in the homogeneous case (constant $T$).
When heterogeneities are included, the calculation becomes harder, and when edge weights are included it
becomes largely intractable. However, these are easily estimated through simulation.

These observations show that using $\mathcal{R}_0$ to predict $\mathcal{A}$ will generally be inadequate. In a homogeneous but
clustered population, $\mathcal{R}_0$ is reduced but $\mathcal{A}$ is unaffected, and so predictions of $\mathcal{A}$ based on $\mathcal{R}_0$ will be too
small. In networks that are not clustered but have heterogeneities in susceptibility, $\mathcal{R}_0$ is unaffected but $\mathcal{A}$
is substantially reduced. Consequently, the value of $\mathcal{A}$ predicted from $\mathcal{R}_0$ will be too large.

Perhaps our most important conclusion about clustering is that it plays an important role in altering the 
growth of an epidemic, but it only plays a small role in determining whether an epidemic may occur or how 
big it would be. If the relevant questions are, "how likely is an epidemic and how large would it be?" then 
the modeller may proceed ignoring clustering. If however, the question is "how fast will an epidemic grow?" 
then clustering must be considered, but only enough to calculate $\mathcal{R}_{0,1}$.

Our results have implications for designing intervention strategies. A number of strategies are available to
control epidemic spread, including travel restrictions, quarantines, and vaccination. Most of the mathematical
theory predicting the effects of these strategies has been developed under the assumption of no clustering.
Most immediately, if we measure $\mathcal{R}_0 = 2$ at the early stages of an epidemic, traditional approaches will suggest
that vaccinating just over half of the population will bring the epidemic below threshold. However, if
the population is clustered, then the observed $\mathcal{R}_0$ was already affected by the fact that some transmission
chains were redundant. Following vaccination, some of these chains will no longer be redundant and the
disease may still spread with $\mathcal{R}_0 > 1$.
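
For reference, the traditional estimate follows from the standard well-mixed argument, sketched here under the assumption of a fully effective vaccine given to a random fraction $f$ of the population (the symbol $f$ is introduced only for this illustration).

```latex
% Each potential transmission reaches a vaccinated individual with probability f
\mathcal{R}_0 (1 - f) < 1
\quad \Longleftrightarrow \quad
f > 1 - \frac{1}{\mathcal{R}_0} ,
\qquad \text{so } \mathcal{R}_0 = 2 \text{ gives } f > \tfrac{1}{2} .
```

This argument treats every blocked contact as an independent lost transmission, which is the step that fails when some transmission chains are redundant.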

Achieving a better understanding of the effect of clustering further helps to guide our intuition when
choosing between strategies. For example, let us assume that we have the choice between two strategies: in
the first, we stagger work schedules in such a way that a typical person's number of contacts is reduced by 1/3; in the
second, we implement population-wide behavior changes so that the same reduction in number of contacts
is achieved, but the work contacts are unaltered. The first reduces clustering while the second increases the
relative frequency of clustering. The value of $\mathcal{R}_0$ is much smaller in the second case than in the first because
of the larger clustering, but $\mathcal{P}$ and $\mathcal{A}$ are reduced by a comparable amount in both cases. Which strategy is
best depends on our goals and relative costs.

Strategies that enhance heterogeneity in infectiousness or susceptibility can be important to help reduce
$\mathcal{P}$ or $\mathcal{A}$ even when there is little impact on $\mathcal{R}_0$. Depending on which quantity we want to minimize, different
choices will be optimal. Consider a choice between vaccinating all individuals with a vaccine that reduces
$T_{uv}$ by a factor of 1/2 for all pairs $u$ and $v$ or a contact tracing strategy that will remove 1/2 of all new
infections before they have a chance to infect anyone. Both strategies reduce $\langle T \rangle$ by a half. However, the
first reduces $T_{out}$ uniformly, while the second increases heterogeneity in $T_{out}$. Thus if we have the choice of
the two strategies, contact tracing is more likely to eliminate the disease before an epidemic can happen.
If our choice is instead between a global vaccine reducing $T_{in}$ by a factor of 1/2 for all individuals, or a
completely effective vaccine that is only available for 1/2 of the population, the latter choice will be more
effective for reducing $\mathcal{A}$.
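
The comparison between uniform vaccination and contact tracing can be illustrated with a small calculation. The sketch below assumes an unclustered network with Poisson degrees, where the probability $h$ that a secondary case fails to start an epidemic satisfies $h = \langle \exp[\langle k \rangle T_{out} (h - 1)] \rangle$ and the epidemic probability is $1 - h$. The function name, the discrete representation of $Q_{out}$, and the parameter values are illustrative assumptions.

```python
import math

def epidemic_probability(mean_degree, t_out_dist, tol=1e-12, max_iter=100000):
    """Epidemic probability in an unclustered Poisson-degree network.

    t_out_dist is a list of (probability, T_out) pairs describing a discrete
    out-transmissibility distribution (a hypothetical stand-in for Q_out).
    """
    def no_epidemic(h):
        # Average over T_out of the Poisson generating function at 1 - T_out + T_out*h
        return sum(p * math.exp(mean_degree * t * (h - 1.0)) for p, t in t_out_dist)

    h = 0.0
    for _ in range(max_iter):
        h_new = no_epidemic(h)
        if abs(h_new - h) < tol:
            break
        h = h_new
    return 1.0 - no_epidemic(h)

mean_degree, T = 6.0, 0.5              # illustrative baseline values

# Vaccine: every individual transmits with T/2.
p_vaccine = epidemic_probability(mean_degree, [(1.0, T / 2)])

# Contact tracing: half of all cases are removed before transmitting (T_out = 0).
p_tracing = epidemic_probability(mean_degree, [(0.5, 0.0), (0.5, T)])

print(p_vaccine, p_tracing)
```

Both interventions give the same $\langle T \rangle$, but the heterogeneous $T_{out}$ produced by contact tracing yields the lower epidemic probability, as the paragraph above argues.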

Acknowledgements 

This work was supported by the Division of Mathematical Modeling at the UBC CDC under CIHR (grants 
no. MOP-81273 and PPR-79231) and the BC Ministry of Health (Pandemic Preparedness Modeling Project),
by DOE at LANL under Contract DE-AC52-06NA25396 and the DOE Office of ASCR program in Applied 
Mathematical Sciences, and by the RAPIDD program of the Science & Technology Directorate, Department 
of Homeland Security and the Fogarty International Center, National Institutes of Health. Luis M. A. 
Bettencourt contributed greatly to the early development of this work. I am grateful to Sara Y del Valle for 
providing the EpiSimS network data. 

A Epidemic Percolation Networks 

In this appendix, we describe the Epidemic Percolation Network (EPN), a tool that allows us to consider 
an epidemic as a static object rather than a dynamically changing process. This eases the understanding 
of certain key features and provides an improved technique to efficiently estimate $\mathcal{P}$. EPNs have received
moderate use recently [23, 22, 33], and a precursor appeared in [26]. A sample EPN for an Erdős-Rényi
network of average degree 3 and $T = 0.4$ is shown in figure 13.

Typically to estimate $\mathcal{P}$ in an SIR model many Monte Carlo simulations are performed. This process
requires many iterations to have confidence in the results. Representative results from 500 such simulations
are found in figure 14. Note that there is considerably more noise in the estimates of $\mathcal{P}$ than in the estimates
of $\mathcal{A}$.

Instead we generate a single EPN $E$. We first assign $\mathcal{I}$ and $\mathcal{S}$ to each node and (if necessary) $w$ to each
edge. Then for each node $u$ and neighbour $v$ we calculate $T_{uv}$ and place the directed edge $(u, v)$ into $E$ with
probability $T_{uv}$. The distribution of out-components of a given node is the same as for the final outbreak
following an introduced infection of that node in the original epidemic model.

^It is important that this assignment occur prior to infection [or at least independent of outbreak history]. If the infectiousness
of $v$ depends on the infectiousness of the node that infected $v$, then these results fail. This is the time-homogeneity assumption
of [22] and is also used by [26]. Some effects that can occur when this assumption is false appear in [18].

Figure 13: The underlying network for figure 1 and an EPN that leads to the same outbreak. Nodes in the
$G_{scc}$ are denoted by large circles, nodes in the $G_{in}$ (but not in the $G_{scc}$) are denoted by pentagons, nodes
in the $G_{out}$ (but not in the $G_{scc}$) are denoted by triangles, and nodes not in any of these components are
denoted by small circles.

Figure 14: $\mathcal{P}$ and $\mathcal{A}$ in an Erdős-Rényi network of 10^ nodes and $\langle k \rangle = 4$. Theory (curves) compares well
with results of 500 simulations (symbols). We take $T_{uv} = 1 - e^{-\alpha \mathcal{I}_u \mathcal{S}_v}$, with distributions of $\mathcal{I}$ and $\mathcal{S}$ as
given in table 1.

If the system is above the epidemic threshold, then E will (almost surely) have a giant strongly connected 
component Gscc [SlIIB]. We follow [13] and define the set of nodes (including Gscc) from which Gscc may 
be reached following the directed edges to be the giant in-component $G_{in}$. We symmetrically define $G_{out}$ to
be the set of nodes reachable from $G_{scc}$. Note that $G_{scc} = G_{in} \cap G_{out}$. If the initial infection is in $G_{in}$, an
epidemic occurs, and all nodes in $G_{out}$ become infected. Thus the size of $G_{in}$ corresponds to the probability
of an epidemic $\mathcal{P}$ and the size of $G_{out}$ corresponds to the size of an epidemic $\mathcal{A}$. This may be seen by
comparing the EPN in figure [13] with the outbreak shown in figure 

Thus in the limit of large networks, epidemic probability is well-approximated by $\mathcal{P} = |G_{in}| / |G|$ while the
fraction infected is well-approximated by $\mathcal{A} = |G_{out}| / |G|$. This observation allows us to estimate $\mathcal{P}$ from a
single EPN (figure 15), rather than from hundreds of simulations (figure 14). If the structure of the network
is sufficiently random, the error in $\mathcal{P}$ and $\mathcal{A}$ from a single EPN is $O(\log N / N)$ (see, e.g., [7]), and so in a
large population a single simulation will provide a sufficiently good estimate.
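
The procedure can be sketched in a few dozen lines. The code below is a minimal illustration, not the implementation used for the figures: it assumes constant $T$ (so $T_{uv} = T$ for every pair), builds a random graph by sampling edges uniformly, and finds the giant strongly connected component with a Kosaraju search. All function names are hypothetical.

```python
import random

def random_graph(n, mean_degree, rng):
    """Undirected random graph with about n*mean_degree/2 edges (adjacency lists)."""
    adj = [[] for _ in range(n)]
    seen = set()
    target = int(n * mean_degree / 2)
    while len(seen) < target:
        u, v = rng.randrange(n), rng.randrange(n)
        edge = (min(u, v), max(u, v))
        if u != v and edge not in seen:
            seen.add(edge)
            adj[u].append(v)
            adj[v].append(u)
    return adj

def build_epn(adj, T, rng):
    """Keep each directed edge (u, v) independently with probability T."""
    return [[v for v in nbrs if rng.random() < T] for nbrs in adj]

def reverse(g):
    rev = [[] for _ in g]
    for u, nbrs in enumerate(g):
        for v in nbrs:
            rev[v].append(u)
    return rev

def reachable(g, sources):
    """All nodes reachable from `sources` along directed edges (sources included)."""
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for v in g[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def largest_scc(g):
    """Largest strongly connected component via an iterative Kosaraju search."""
    n = len(g)
    visited, order = [False] * n, []
    for s in range(n):                       # first pass: record finishing order
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(g[u]):
                stack.append((u, i + 1))
                v = g[u][i]
                if not visited[v]:
                    visited[v] = True
                    stack.append((v, 0))
            else:
                order.append(u)
    rev, assigned, best = reverse(g), [False] * n, []
    for s in reversed(order):                # second pass on the reversed graph
        if assigned[s]:
            continue
        assigned[s] = True
        comp, stack = [s], [s]
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if not assigned[v]:
                    assigned[v] = True
                    comp.append(v)
                    stack.append(v)
        if len(comp) > len(best):
            best = comp
    return best

def epn_estimates(epn):
    """Return (|G_in|/|G|, |G_out|/|G|), the single-EPN estimates of P and A."""
    scc = largest_scc(epn)
    g_out = reachable(epn, scc)              # nodes reachable from G_scc
    g_in = reachable(reverse(epn), scc)      # nodes from which G_scc is reachable
    return len(g_in) / len(epn), len(g_out) / len(epn)
```

A call such as `epn_estimates(build_epn(random_graph(5000, 4, rng), 0.4, rng))` with `rng = random.Random(0)` returns one estimate of each quantity. Below the epidemic threshold the largest strongly connected component is not giant, and both fractions are close to zero.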

^It is possible that a small number of nodes outside of $G_{out}$ are infected, but the proportion vanishes as $|G| \to \infty$.

Figure 15: Same as figure 14 but calculated through a single EPN for each $T$. The noise is substantially
reduced in the $\mathcal{P}$ calculations, but slightly increased in the $\mathcal{A}$ calculations.

B The basic reproductive ratio 

In this appendix we provide examples demonstrating the need for the more careful definition of $\mathcal{R}_0$ used in
the main text and we explore properties of this definition.

A pair of simple examples demonstrates the difficulties with the standard definition. In our first example,
the standard definition suggests no epidemic is possible ($\mathcal{R}_0 < 1$), while in fact epidemics are possible. In our second
example, the standard definition suggests epidemics are possible ($\mathcal{R}_0 > 1$), while in fact they are not.

For the first example, consider a fully-connected population of $|G| \gg 1$ nodes. We add $3|G|$ isolated nodes
and consider a disease for which $T = 3 / |G|$. A node in the connected component will infect on average 3
nodes, while an isolated node infects none. On average therefore, a random index case infects 0.75 other
nodes. Under the standard definition $\mathcal{R}_0 = 0.75$ and epidemics should be impossible. However, if the index
case is in the connected component, the introduction is likely to lead to an epidemic.

Alternately, consider a population of $|G|$ nodes with each node having three neighbours. For simplicity we
assume no short cycles. Assume that a disease spreads with probability $p \in (1/3, 1/2)$ to a given neighbour.
The average number of secondary infections caused by a single introduced infection is $3p > 1$, giving $\mathcal{R}_0 > 1$
under the standard definition. However, each secondary infection has only two susceptible neighbours, and
so infects on average $2p < 1$ neighbours, and the outbreak dies out.
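
The arithmetic in both examples is easy to verify. The sketch below uses illustrative values ($|G| = 10^4$ and $p = 0.4$) chosen only for this check.

```python
def example_one(n):
    """Clique of n nodes plus 3n isolated nodes, with T = 3/n."""
    T = 3.0 / n
    clique_mean = (n - 1) * T                    # secondary cases if the index case is in the clique
    population_mean = n * clique_mean / (4 * n)  # isolated index cases contribute zero
    return clique_mean, population_mean

def example_two(p):
    """Three neighbours per node, no short cycles."""
    first_generation = 3 * p      # the index case has three susceptible neighbours
    later_generations = 2 * p     # every later case has already used one edge
    return first_generation, later_generations

clique_mean, population_mean = example_one(10_000)
first, later = example_two(0.4)
print(clique_mean, population_mean)
print(first, later)
```

In the first example the population average is below 1 although growth inside the clique is well above 1. In the second the first generation exceeds 1 while every later generation falls below it.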

Some of these issues have been dealt with by [12] , who considered compartmental deterministic models of 
several types of individuals. At early time nonlinear terms are unimportant, and the profile of the infected 
population aligns with the eigenvector of the "next-generation" matrix. In stochastic settings, the same 
alignment occurs, but it may do so more quickly or slowly than predicted and for some realisations it 
may instead die out. To make a more rigorous definition of $\mathcal{R}_0$, we turn to statements about the average
behaviour. We set

$$\mathcal{R}_{0,r} = \frac{\mathbb{E}[N_{r+1}]}{\mathbb{E}[N_r]}$$

to be the ratio of the expected number of infections in rank $r + 1$ to the expected number in rank $r$. This
value is affected by local small-scale structures. If the network is small, it is also affected by the finite size of
the network, but if the network is large enough relative to $r$, we expect that the value will be unaffected by
large-scale structure. In more concrete terms, the early growth of a disease in a neighbourhood is unaffected
by whether that neighbourhood is part of a city of 100000, 1 million, or 10 million. As the disease spreads
further, the effect of the finite city size will be noticeable for the smaller cities first. If the population is large
enough, the ratio converges before the finite size has any impact. We define $\mathcal{R}_0$ mathematically as

$$\mathcal{R}_0 = \lim_{r \to \infty} \lim_{|G| \to \infty} \mathcal{R}_{0,r} . \qquad (16)$$
Figure 16: A comparison of the convergence of $\mathcal{R}_{0,r}$ and of the quantities in (17) and (18) for epidemics in the EpiSimS network
($T = 0.075$), an unclustered bimodal network ($T = 0.3$ with each node's degree coming either from a Poisson
distribution peaked at 3 or a Poisson distribution peaked at 6), and an Erdős-Rényi network ($T = 0.3$,
average degree 4). The calculations used 10^ simulations for each network. Note the difference in vertical
scales.



This definition is similar to that of [12], who used

$$\mathcal{R}_0 = \limsup_{r \to \infty} \limsup_{|G| \to \infty} \tilde{R}_r , \qquad (17)$$

which is the limit as $r \to \infty$ of the geometric mean $\tilde{R}_r$ of $\mathcal{R}_{0,0}, \ldots, \mathcal{R}_{0,r-1}$ (assuming the limit exists). This
definition is more general and will converge in some cases where (16) does not. However, if (16) does converge
(and typically we see that it does), then it reaches the same value, but does so sooner. So to clearly see $\mathcal{R}_0$
from (17), we must have a larger network.

Another suitable definition would be

$$\hat{R}_r = \mathbb{E}[N_{r+1} / N_r] ,$$

$$\mathcal{R}_0 = \lim_{r \to \infty} \lim_{|G| \to \infty} \hat{R}_r , \qquad (18)$$

where the expectation is taken over realisations with $N_r \neq 0$. This will tend to require more steps to converge
because it counts small outbreaks equally with large outbreaks, and so outbreaks which have not yet grown
and are dominated by stochastic effects would be as important to the average as well-established epidemics.

A comparison of these three definitions of $\mathcal{R}_0$ is shown in figure 16. They all result in similar values
for $\mathcal{R}_0$. For a clustered network, equation (16) converges more quickly. For large unclustered networks,
$\mathcal{R}_{0,r}$ coincides with the mean ratio of (18) and both converge to $\mathcal{R}_0$ at $r = 1$, while the geometric mean of (17) takes longer. In an Erdős-Rényi network, all three
definitions give $\mathcal{R}_{0,r} = \mathcal{R}_0$ for all $r$; only noise due to insufficient simulations affects the calculation.
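
The three estimators can be compared on simulated rank counts. The sketch below assumes constant $T$, a rank-by-rank outbreak in which each infected node tries once to infect each neighbour, and a random graph built by sampling edges uniformly. The function names are hypothetical, and the geometric mean is taken to be $\mathbb{E}[N_r]^{1/r}$, which equals the geometric mean of $\mathcal{R}_{0,0}, \ldots, \mathcal{R}_{0,r-1}$ when $N_0 = 1$.

```python
import random

def random_graph(n, mean_degree, rng):
    """Undirected random graph with about n*mean_degree/2 edges (adjacency lists)."""
    adj = [[] for _ in range(n)]
    seen = set()
    target = int(n * mean_degree / 2)
    while len(seen) < target:
        u, v = rng.randrange(n), rng.randrange(n)
        edge = (min(u, v), max(u, v))
        if u != v and edge not in seen:
            seen.add(edge)
            adj[u].append(v)
            adj[v].append(u)
    return adj

def rank_counts(adj, T, max_rank, rng):
    """One outbreak from a random index case; returns [N_0, ..., N_max_rank]."""
    index = rng.randrange(len(adj))
    infected = {index}            # every node infected so far
    current = [index]
    counts = [1]
    for _ in range(max_rank):
        nxt = []
        for u in current:
            for v in adj[u]:
                if v not in infected and rng.random() < T:
                    infected.add(v)
                    nxt.append(v)
        counts.append(len(nxt))
        current = nxt
    return counts

def estimators(runs, r):
    """Estimates at rank r >= 1 from a list of rank-count lists."""
    mean_r = sum(c[r] for c in runs) / len(runs)
    mean_next = sum(c[r + 1] for c in runs) / len(runs)
    ratio_of_means = mean_next / mean_r                  # E[N_{r+1}] / E[N_r]
    geometric_mean = mean_r ** (1.0 / r)                 # E[N_r]^(1/r), using N_0 = 1
    alive = [c for c in runs if c[r] > 0]                # realisations with N_r != 0
    mean_of_ratios = sum(c[r + 1] / c[r] for c in alive) / len(alive)
    return ratio_of_means, geometric_mean, mean_of_ratios
```

For example, with `rng = random.Random(1)`, `adj = random_graph(2000, 4, rng)`, and `runs = [rank_counts(adj, 0.3, 3, rng) for _ in range(3000)]`, the call `estimators(runs, 1)` returns three values that should all lie near $T \langle k \rangle = 1.2$ up to sampling noise, since this graph is close to an Erdős-Rényi network.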

To be fully rigorous, the $|G| \to \infty$ limit must be appropriately defined. It does not make sense to talk
about $|G| \to \infty$ for a given network, and we cannot simply add nodes to the pre-existing network. We must
take a sequence of networks in such a way that the small-scale structure is preserved, and as the network
size grows, the size of the preserved structure increases.

To make this rigorous, we follow [33]. Take a sequence of finite networks $G_n$, with $|G_n| \to \infty$ as $n \to \infty$.
We define $B_r$ to be the network induced on the set of nodes within distance $r$ of a central node. The sequence
of networks is taken so that the probability that the structure surrounding a randomly chosen central node
is isomorphic to a given $B_r$ is the same for all $G_n$ if $n > r$. This means that the small-scale structure in the
different networks is the same, and the size of what is considered "small-scale" increases with $n$.
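The neighbourhood $B_r$ can be made concrete with a short breadth-first search. This is a minimal sketch that assumes the network is stored as a dictionary mapping each node to the set of its neighbours; the function name `ball` is hypothetical, and the isomorphism comparison between neighbourhoods is not attempted here.

```python
from collections import deque
from typing import Dict, Hashable, Set


def ball(
    adj: Dict[Hashable, Set[Hashable]], centre: Hashable, r: int
) -> Dict[Hashable, Set[Hashable]]:
    """Return the subgraph induced on the nodes within distance r of `centre`."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not explore beyond radius r
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Induced subgraph: keep every edge whose two endpoints both lie in the ball.
    return {u: {v for v in adj[u] if v in dist} for u in dist}
```

Because the subgraph is induced, an edge between two nodes at distance $r$ from the centre is kept, so triangles on the boundary of the ball are preserved.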

We note that although the $|G| \to \infty$ limit may be well-defined, it is possible that the $r \to \infty$ limit in
(16) does not converge. This may occur because, for example, growth within a neighbourhood may happen
at one rate, while spread between neighbourhoods in a suburb may happen at another, and spread between
suburbs in a city may happen at yet another. If the rate of spread continues to change as the grouping size
changes, then the $r \to \infty$ limit may not exist. An effect analogous to this may appear in [J which considered
disease spread in Italy. Two distinct growth rates are seen depending on whether the disease is spreading in
the country at large or in Rome.

Figure 17: A comparison of the convergence of $\mathcal{R}_{0,r}$, $\hat{R}_r$ and $R_r$ for the same networks and conditions as
in figure 16, except using a single EPN for each data point with 10^ different index cases within that EPN
rather than 10^ distinct simulations [actually at submission the first two plots used only 10^ index cases,
more will be added when calculations complete].

Finally, it is possible to estimate $\mathcal{R}_0$ using a single EPN rather than multiple simulations. This is a much
faster process, and so it is possible to do many more simulations to reduce the noise. However, because the
index cases are all chosen from a single EPN, there may be a small systematic error. In figure 17 we plot the values for the
same conditions as in figure 16. We use the same number of index cases and so the noise is comparable.
The value of $\mathcal{R}_0$ is not noticeably affected by choosing a single EPN rather than multiple simulations. In the
calculations in the paper, we have used a single EPN rather than multiple simulations.
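The single-EPN procedure can be sketched as a breadth-first search from each index case. The sketch assumes that the EPN is stored as a directed graph in which an edge from u to v means that u would infect v if u became infected, so that a node's rank is its directed distance from the index case. The function name `rank_sizes` and the dictionary representation are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List


def rank_sizes(
    epn: Dict[Hashable, Iterable[Hashable]], index_case: Hashable, max_rank: int
) -> List[int]:
    """Return [N_0, ..., N_max_rank] for an outbreak started at `index_case`."""
    rank = {index_case: 0}
    sizes = [0] * (max_rank + 1)
    sizes[0] = 1
    queue = deque([index_case])
    while queue:
        u = queue.popleft()
        if rank[u] == max_rank:
            continue  # ranks beyond max_rank are not counted
        for v in epn.get(u, ()):
            if v not in rank:  # each node is infected at most once
                rank[v] = rank[u] + 1
                sizes[rank[v]] += 1
                queue.append(v)
    return sizes
```

Repeating the call for many index cases in the same EPN reuses the same transmission outcomes, which is why the approach is fast and also why the outbreaks are not fully independent.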



C Epidemics in Configuration Model Networks 

We briefly review previous work for epidemic spread in CM networks. These are the simplest networks to
investigate, and so the theory has been developed further than for other networks [3] [36] [30] [23] [32] [39] [28].
See [33] for some discussion of more arbitrary unclustered networks.^ We extend the earlier theory by
allowing independently assigned edge weights.^



C.0.1 The basic reproductive ratio 

Early in the spread of an infectious disease on a CM network, the probability of a node becoming infected is
proportional to its degree, and so the pdf for the degree of infected nodes is $kP(k)/\langle k \rangle$. We choose an infected
node $u$ with degree $k$ uniformly from nodes of rank $r$. If the network is large enough that we can ignore short
cycles, then all of $u$'s neighbours are susceptible except the node which infected $u$. Thus $u$ may infect up
to $k - 1$ neighbours. The probability $T_{out}(u)$ that $u$ will infect a randomly chosen neighbour is chosen from
$Q_{out}(T_{out})$, and so the probability $u$ infects exactly $j \le k - 1$ neighbours is $\binom{k-1}{j} T_{out}(u)^j [1 - T_{out}(u)]^{k-1-j}$.
Integrating this over possible values of $T_{out}$ and summing over $k$ and $j$, we find that for $r > 0$ the rank
reproductive ratio is

$$\mathcal{R}_{0,r} = \frac{1}{\langle k \rangle} \sum_k k P(k) \int_0^1 \sum_{j=0}^{k-1} j \binom{k-1}{j} T_{out}^j (1 - T_{out})^{k-1-j} Q_{out}(T_{out}) \, dT_{out} = \langle T \rangle \frac{\langle k^2 - k \rangle}{\langle k \rangle}$$

and so

$$\mathcal{R}_0 = \langle T \rangle \frac{\langle k^2 - k \rangle}{\langle k \rangle} \qquad (19)$$



^Perhaps the most significant result for non-CM networks is that if the higher degree nodes preferentially contact other high 
degree nodes, then the threshold transmissibility for an epidemic is reduced. 

^If edge weights are not assigned independently, then infection along different edges is not independent, and the methods of 
this section do not apply. 
Thus we find that for CM networks^ $\mathcal{R}_0 \neq \mathcal{R}_{0,0} = \langle T \rangle \langle k \rangle$.
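The difference between the index-case ratio and the later ranks is easy to check numerically. The sketch below evaluates $\langle T \rangle \langle k \rangle$ and $\langle T \rangle \langle k^2 - k \rangle / \langle k \rangle$ for a degree distribution supplied as a dictionary, and also evaluates the inner binomial sum of the derivation, which should reduce to $(k-1)T_{out}$. The function names and the toy distribution in the usage note are hypothetical.

```python
from math import comb
from typing import Dict, Tuple


def cm_reproductive_ratios(degree_pmf: Dict[int, float], mean_T: float) -> Tuple[float, float]:
    """Return (R_{0,0}, R_0) for a configuration model network."""
    mean_k = sum(k * p for k, p in degree_pmf.items())
    mean_k2_minus_k = sum(k * (k - 1) * p for k, p in degree_pmf.items())
    r00 = mean_T * mean_k                    # index case: all k neighbours susceptible
    r0 = mean_T * mean_k2_minus_k / mean_k   # later ranks: equation (19)
    return r00, r0


def expected_infections(k: int, t_out: float) -> float:
    """Evaluate sum_j j * C(k-1, j) * T^j * (1-T)^(k-1-j) for j = 0..k-1."""
    return sum(
        j * comb(k - 1, j) * t_out ** j * (1 - t_out) ** (k - 1 - j)
        for j in range(k)
    )
```

For a toy population in which half the nodes have degree 3 and half have degree 6, with $\langle T \rangle = 0.3$, `cm_reproductive_ratios({3: 0.5, 6: 0.5}, 0.3)` gives values close to 1.35 for the index case and 1.2 for later ranks.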
C.0.2 Probability and size

We look for the probability that a single infected node causes a chain of infections leading to an epidemic.
Because interchanging edge direction in an EPN interchanges $\mathcal{P}$ and $\mathcal{A}$, we may focus on calculating $\mathcal{P}$.
Equivalent techniques replacing $T_{out}$ by $T_{in}$ below give $\mathcal{A}$. Our analysis is performed in the infinite network
limit.

We set $f$ to be the probability a randomly chosen index case does not start an epidemic. We find



where h is the probability a randomly chosen secondary case does not start an epidemic. The value of h 
satisfies the recurrence relation 



If $\mathcal{R}_0 < 1$, the trivial solution $f = h = 1$ is the only solution. For $\mathcal{R}_0 > 1$ an additional solution appears and
is the physically relevant root. From this we can calculate $\mathcal{P} = 1 - f$.
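As a sketch of the form the relations for $f$ and $h$ take, assume (as in the derivation of $\mathcal{R}_0$) that short cycles can be ignored and that each node's $T_{out}$ is drawn independently from $Q_{out}$. An index case of degree $k$ fails to start an epidemic only if each neighbour either is not infected (probability $1 - T_{out}$) or is infected but fails to start an epidemic (probability $T_{out} h$). A secondary case has degree distribution $kP(k)/\langle k \rangle$ and one fewer susceptible neighbour. The notation below is a reconstruction under these assumptions and may differ from the authors' own:

```latex
\begin{align}
f &= \int_0^1 \sum_k P(k)\,\bigl[1 - T_{out} + T_{out}\,h\bigr]^{k}\, Q_{out}(T_{out})\, dT_{out}, \\
h &= \frac{1}{\langle k \rangle} \int_0^1 \sum_k k\,P(k)\,\bigl[1 - T_{out} + T_{out}\,h\bigr]^{k-1}\, Q_{out}(T_{out})\, dT_{out}.
\end{align}
```

Differentiating the right-hand side of the second relation at $h = 1$ gives $\langle T \rangle \langle k^2 - k \rangle / \langle k \rangle$, so a root below 1 appears exactly when this quantity exceeds 1, which agrees with the threshold stated above.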

Note that $\mathcal{P}$ depends on the distribution of $T_{out}$, but is not affected by the distribution of $T_{in}$. Similarly
$\mathcal{A}$ depends on the distribution of $T_{in}$ but is not affected by the distribution of $T_{out}$. This result holds for
unclustered, but not for clustered, networks.
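The dependence of $\mathcal{P}$ on the distribution of $T_{out}$ can be illustrated numerically. The sketch below is not taken from the text: it assumes the standard branching-process relations for an unclustered network, in which a secondary case of degree $k$ fails to start an epidemic with probability $[1 - T_{out} + T_{out} h]^{k-1}$ and an index case with probability $[1 - T_{out} + T_{out} h]^{k}$, averaged over the degree and $T_{out}$ distributions. It solves for $h$ by fixed-point iteration from $h = 0$. The function name and the discrete representation of $Q_{out}$ are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def epidemic_probability(
    degree_pmf: Dict[int, float],
    t_out_pmf: Sequence[Tuple[float, float]],
    tol: float = 1e-12,
    max_iter: int = 100_000,
) -> float:
    """Return P = 1 - f for an unclustered network without short cycles.

    degree_pmf maps degree k to P(k); t_out_pmf lists (T_out value, probability).
    """
    mean_k = sum(k * p for k, p in degree_pmf.items())

    def secondary(h: float) -> float:
        # Probability that a secondary case (degree-biased, k - 1 free edges)
        # does not start an epidemic when each neighbour fails with probability h.
        return sum(
            q * k * p * (1 - t + t * h) ** (k - 1)
            for t, q in t_out_pmf
            for k, p in degree_pmf.items()
            if k > 0
        ) / mean_k

    # Iterating upward from 0 converges to the smallest root in [0, 1].
    h = 0.0
    for _ in range(max_iter):
        h_next = secondary(h)
        converged = abs(h_next - h) < tol
        h = h_next
        if converged:
            break

    f = sum(
        q * p * (1 - t + t * h) ** k
        for t, q in t_out_pmf
        for k, p in degree_pmf.items()
    )
    return 1.0 - f
```

In a toy network where every node has degree 4 and $\langle T \rangle = 0.5$, a homogeneous population (`[(0.5, 1.0)]`) gives a noticeably larger epidemic probability than one in which half the nodes never transmit and half always do (`[(0.0, 0.5), (1.0, 0.5)]`), even though both have the same $\mathcal{R}_0$.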

C.0.3 Summary

We have shown that for CM networks, $\mathcal{R}_0 = \langle T \rangle \langle k^2 - k \rangle / \langle k \rangle$. In particular it depends only on the
network properties and the average transmissibility. In contrast, the probability $\mathcal{P}$ and size $\mathcal{A}$ are affected
by the details of the distribution. Intuitively, this is easy to understand. For example, if we consider $\mathcal{A}$ in
populations with varying $T_{in}$, at early times the rate of growth is governed by the average number of new
infections created, which depends on the average transmissibility. However, a disproportionate number of
highly susceptible nodes are infected, and so the average $T_{in}$ of remaining nodes drops. By the end of the
epidemic nodes are much harder to infect than they would have been if all were equally susceptible initially,
and so the epidemic infects fewer people.

A consequence of this is that we cannot predict $\mathcal{A}$ based only on the early growth rate. Although
this is frequently done (see for example [27] and references therein), these calculations usually assume that 
the population is homogeneously susceptible, which is not always the case, particularly when a vaccine or 
previous exposure to similar diseases exists. 



^Unless the degree distribution satisfies $\langle k^2 - k \rangle = \langle k \rangle^2$. The best-known such networks are Erdős-Rényi networks, which
have a Poisson degree distribution in the limit of large network size.

D The EpiSimS Network

We consider a network produced by EpiSimS for Portland, Oregon [HI [15] [5]. This simulation uses Census
data, road structure, building locations, and population surveys to construct a virtual population that travels
through the city. From the activity of individuals in the simulation, we may reconstruct who was in contact
with whom and for how long.

There are 1615860 nodes in the network, of which 1591010 are in the giant component. The average
degree is approximately 16, and the average squared degree is approximately 359. The degree distribution
has an exponential tail, and clustering is concentrated in the low-degree nodes. For our approximations of
$\mathcal{R}_0$, we also need information about length 2 paths. We calculate the number of pairs of nodes with each
value of $U_{uv}$ for which $X_{uv} = 0$ and $X_{uv} = 1$. Large values of $U_{uv}$ are more frequent when $X_{uv} = 1$. The
distribution of edge weights is fairly broad. Many contacts are very short, but the number of long contacts
is not negligible.

Figure 18: Properties of the EpiSimS network. For the final plot, contact times are binned in quarter hour
increments, but exact values were used in calculations.
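For a sense of scale, the degree moments quoted for the EpiSimS network can be put into the CM formula $\mathcal{R}_0 = \langle T \rangle \langle k^2 - k \rangle / \langle k \rangle$ of appendix C. Taking $\langle T \rangle = 0.075$ (the value used for the EpiSimS network in figure 16) is an assumption made here for illustration. The result describes a CM network with the same first two degree moments, not the clustered EpiSimS network itself:

```latex
\mathcal{R}_0^{\mathrm{CM}} \approx 0.075 \times \frac{359 - 16}{16} \approx 1.61,
\qquad
\mathcal{R}_{0,0} \approx 0.075 \times 16 = 1.2 .
```

Because clustering is ignored in this estimate, any gap between it and the value observed in the EpiSimS simulations is one rough indication of the combined effect of clustering and edge weights.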